Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SDK Router message handling #316

Merged
merged 21 commits into from
Sep 8, 2023
Merged

Add SDK Router message handling #316

merged 21 commits into from
Sep 8, 2023

Conversation

joshua-kim
Copy link
Contributor

@joshua-kim joshua-kim commented Sep 5, 2023

Why this should be merged

Adds the P2P SDK's router to handle incoming sdk messages

How this works

Forwards messages that fail unmarshaling against the codec into the SDK router

How this was tested

Added a unit test

@joshua-kim joshua-kim force-pushed the sdk-gossip-noop branch 5 times, most recently from b8dcf99 to 8943b79 Compare September 5, 2023 14:43
@joshua-kim joshua-kim mentioned this pull request Sep 5, 2023
peer/network.go Outdated Show resolved Hide resolved
peer/network.go Outdated Show resolved Hide resolved
peer/network.go Outdated Show resolved Hide resolved
peer/network.go Outdated Show resolved Hide resolved
Comment on lines +464 to +468
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, gossipMsg))
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, requestMessage))
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, garbageResponse))
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, emptyResponse))
assert.ErrorIs(t, p2p.ErrUnrequestedResponse, clientNetwork.AppResponse(context.Background(), nodeID, requestID, nilResponse))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was written to ensure that an invalid message NEVER triggers an unintentional fatal error, so it seems a bit weird to change it in this way.

Copy link
Contributor Author

@joshua-kim joshua-kim Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was unsure if it made more sense to just remove these test cases or just test for the sdk error. I guess now the property that we have is that invalid messages are always forwarded into the router, so we can either check for this error, get rid these tests, or write a Router interface that p2p.Router implements but maybe that's overkill.

StephenButtolph
StephenButtolph previously approved these changes Sep 7, 2023
peer/network.go Outdated Show resolved Hide resolved
@StephenButtolph StephenButtolph dismissed their stale review September 7, 2023 21:54

one more question

peer/network.go Outdated Show resolved Hide resolved
peer/network.go Outdated Show resolved Hide resolved
peer/network.go Outdated
@@ -365,15 +365,16 @@ func (n *network) AppRequest(ctx context.Context, nodeID ids.NodeID, requestID u
// If the response handler returns an error it is propagated as a fatal error.
func (n *network) AppResponse(ctx context.Context, nodeID ids.NodeID, requestID uint32, response []byte) error {
n.lock.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since closed is atomic, could we move this lock to directly before
handler, exists := n.markRequestFulfilled(requestID)?

Should we move it to inside markRequestFulfilled instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes much more sense than what I was currently doing.

peer/network.go Outdated
n.lock.Lock()
defer n.lock.Unlock()

func (n *network) AppResponse(ctx context.Context, nodeID ids.NodeID, requestID uint32, response []byte) error {
if n.closed.Get() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very subtle change. So we should be terrified to make it.

I believe we must hold the lock when checking for n.closed on all of the inbound response + requestFailed flows to avoid a potential panic due to a write to a closed channel (or closing a channel twice).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline I'll see if I can make a separate regression test for this invariant as a follow-up to this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I documented the invariant on n.closed as well 👍

peer/network.go Outdated

if n.closed.Get() {
n.lock.Unlock()
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we okay with dropping responses (and timeouts) that should have been sent to the SDK router after we close the network?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not okay with this - then I think we'll need to be fairly careful around what gets passed through to the router.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We chatted offline but agreed it was cleanest to leave the Router code as-is, and just drop the closed check on the server-end because we're guaranteed that the outstanding request is empty on shutdown because we empty it on Shutdown and stop sending requests after the flag is set.

peer/network.go Outdated Show resolved Hide resolved
@StephenButtolph StephenButtolph merged commit d4e7f3a into master Sep 8, 2023
9 checks passed
@StephenButtolph StephenButtolph deleted the sdk-gossip-noop branch September 8, 2023 18:41
darioush pushed a commit that referenced this pull request Apr 25, 2024
* Add SDK Router message handling (#316)

Co-authored-by: Stephen Buttolph <[email protected]>

* revert avago version bump

* Fix hanging requests after Shutdown (#326) (#859)

* Fix hanging requests after Shutdown (#326)

* fix requests hanging after shutdown

* fix build

---------

Signed-off-by: Stephen Buttolph <[email protected]>
Co-authored-by: Stephen Buttolph <[email protected]>

* Bump avago rc (#860)

* Update to 1.10.10-rc.2 (#328)

* update to avalanchego 1.10.10-rc.2

* nits

* nit

* add batchsize

* increase timeout dynamically

---------

Co-authored-by: Joshua Kim <[email protected]>

---------

---------
darioush pushed a commit that referenced this pull request Apr 25, 2024
* Add SDK Router message handling (#316)

Co-authored-by: Stephen Buttolph <[email protected]>

* Fix hanging requests after Shutdown (#326)

* fix requests hanging after shutdown

* fix build

---------

Signed-off-by: Stephen Buttolph <[email protected]>
Co-authored-by: Stephen Buttolph <[email protected]>

* Update to 1.10.10-rc.2 (#328)

* update to avalanchego 1.10.10-rc.2

* nits

* nit

* Add P2P SDK Pull Gossip (#318)

* add batchsize

* sync changes

* Drop outbound gossip for non vdrs (#862)

* Drop outbound gossip requests for non-validators (#334)

* drop outbound gossip requests for non validators

* nit

* nit

* sync changes

---------

Co-authored-by: Joshua Kim <[email protected]>

---------

Co-authored-by: Joshua Kim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants